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Abstract 

Lattice  Boltzmann  algorihms  are  a mesoscopic 
representation  of  nonlinear  continuum  physics  ( like 
Navier-Stokes,  magnetohydrodynamics  (MHD),  Gross- 
Pitaevskii  equations)  which  are  ideal  for  parallel 
supercomputers  because  they  transform  the  difficult 
nonlinear  convective  macroscopic  derivatives  into  purely 
local  moments  of  distribution  functions.  The  macroscopic 
nonlinearities  are  recovered  by  relaxation  distribution 
functions  in  the  collision  operator  whose  dependence  on 
the  macroscopic  velocity  is  algebraically  nonlinear  and 
thus  purely  local.  Unlike  standard  computational  fluid 
dynamics  codes,  there  is  no  loss  in  parallelization  in 
handling  arbitrary  geometric  boundaries,  e.g.,  using 
bounce-back  rules  from  kinetic  theory.  By  encoding 
detailed  balance  into  the  collision  operator  through  the 
introduction  of  discrete  H-function,  the  lattice  Boltzmann 
algorithm  can  be  made  unconditionally  stable  for 
arbitrary  high  Reynolds  numbers.  It  is  shown  that  this 
approach  is  a special  case  of  a quantum  lattice 
Boltzmann  algorithm  that  entangles  local  qubits  through 
unitary  collision  operators  and  which  is  ideally 
parallelized  on  quantum  computer  architectures.  Here 
we  consider  turbulence  simulations  using  2,048  PEs  on  a 
1,600 5 -spatial  grid.  A connection  is  found  between  the 
rate  of  change  of  enstrophy  and  the  onset  of  laminar-to- 
turbulent  flows. 


1.  Introduction 

There  is  much  interest  in  the  defense  community  to 
accurately  determine  turbulent  flows  over  non-trivial 
boundaries  (e.g.,  instabilities  and  wakes  from  naval  ships 
and  aircraft)  as  well  as  intermittent  turbulence  induced  in 
the  upper  atmosphere  under  the  jet  stream  to  optimize 
laser  propagation  from  airborne  platforms.  However,  for 
many  of  these  turbulent  flows,  standard  computational 
fluid  dynamics  (CFD)  codes  (now  necessarily  non-pseudo 
spectral)  quickly  saturate  with  the  number  of  processing 
elements  (PEs)  due  to  the  nonlocal  nonlinear  convective 
derivatives  in  the  Navier-Stokes  equations.  However,  by 
projecting  into  a lattice  kinetic  phase  space,  the  turbulent 
dynamics  in  this  mesoscopic  description  are  simpler  and 
much  easier  to  solve  with  algorithms  that  are  ideally 
parallelized.  The  lattice  kinetic  solution  is  then  projected 
back  into  the  macroscopic  space  by  Chapman-Enksog 
expansions  to  recover  the  fluid  turbulence  solution. 

The  standard  lattice  Boltzmann  (LB)  scheme 
employs  a simple  Bhatnagar-Gross-Krook  (BGK) 
collision  operator,  with  fixed  relaxation  rate.  This  leads 
to  strong  numerical  instabilities  for  high  Reynolds 
number  flows.  However,  by  enforcing  detailed 
balance^12'  into  the  collision  operator,  a generalized 
(entropic)  BGK  operator  is  found  that  will  lead  to 
unconditionally  stable  numerical  algorithm  for  arbitrary 
high  Reynolds  numbers.  Here,  we  test  this  entropic 
lattice  scheme  (which  in  some  sense  can  be  viewed  as  a 
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large  eddy  simulation)  on  extremely  large  grids  and 
determine  some  of  the  turbulence  statistics.  In  Section  2, 
we  give  a brief  overview  of  the  entropic  lattice  Boltzmann 
scheme  and  show  its  connection  to  quantum  lattice 
algorithms131.  In  Section  3 we  present  some  of  our  free 
shear  turbulence  simulation  results  from  the  Capability 
Applications  Project  (CAP)  runs  using  2,048  PEs  on 
l,6003-spatial  grids.  In  Section  4 we  discuss  a new 
turbulence  morphology  gleaned  from  detailed  analysis  of 
simulations  on  5123-grids  and  present  some  concluding 
remarks  in  Section  5. 

2.  Lattice  Boltzmann  Algorithms 


In  the  Q-dimensional  velocity  space,  the  relaxation 
distribution  function  f‘q  is  determined  analytically  by 

extremizing  the  H-function  subject  to  the  local  collisional 
constraints  of  conservation  of  probability  and  probability 
flux.  f*q , considered  as  a vector,  is  the  bisector  of  the 
difference  between  the  incoming  and  outgoing  kinetic 
vectors  in  the  inviscid  limit  lim^o  aJ2z=  2: 


f = r 
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Eliminating  and  /'  from  Eqs.  (4)  and  (1)  one  obtains 
the  LB  equation 


At  each  space-time  grid  point  (x,/)  in  lattice 
algorithms,  the  excited  state  of  a qubit  | q)  encodes  the 

probability  fq  of  the  existence  of  a mesoparticle  moving 
with  discrete  lattice  velocity  cq  - Axq  / At  . A\q  are  the 
lattice  vector  links,  with  q = 1,2,  ....  Q , where  Q is  the 
number  of  qubits  at  each  spatial  node.  The  particle 
momentum  is  determined  from  a suitably  chosen  qubit- 
qubit  interaction  Hamiltonian  H 'while  the  spatial  location 
arises  from  the  free-streaming  Hamiltonian  -ih^\  cq  ■ V . 

9 

All  the  particle-particle  interactions  generated  by  //'(from 
2-body  up  to  Q-body  interactions)  can  be  mapped  onto  a 
local  collision  operator  Q.q  ,/q)  at  x.  In  particular,  for 
type-II  quantum  algorithms,  the  quantum  entanglement  is 
localized  to  those  Q-qubits  at  (x,t)  and  then  this 
entanglement  is  spread  throughout  the  lattice  by  unitary 
streaming13,41: 

/,'(*.')  = /,  (*>')  + U ■••/(,)’/,  (*  + + A')  = /,'(*>')  • (1) 


Here  fq  is  the  incoming  probability  and  fq  the  outgoing 


probability.  In  the  classical  limit,  there  exists  a 
fundamental  discrete  entropy  function11,2,51 

(2> 

9=1 


where  the  normalized  weights 


are 


determined  self-consistently.  The  collision  operator  Qq  in 
Eq.  (1)  is  determined  so  that  one  remains  on  a constant 
entropy  surface 

H(y;'.../')  = H(yi.../e).  (3) 


Eqs.  (1-3)  constitute  the  basics  of  the  detailed-balance 
lattice  algorithms  for  fluid  turbulence  that  are  ideal  for 
parallel  (both  classical  and  quantum)  supercomputers. 


/ (i  + ax  + At)  = j ;(*./)  +— [/’  (x,/)]-/  (*,/),  9 = 1... Q (5) 

2 T 

This  is  basically  the  entropic  LB11,21  with  the  BGK 
collisional  relaxation  parameters  a(x,t)l 2r  and  f‘q 

determined  from  Eqs.  (2)  and  (3).  In  the  Chapman- 
Enksog  limit,  (Ax  — > 0,  At  -»  0) — and  identifying  the 
density  and  momentum  moments 

= P.  ^jCqfq~Pu — one  recovers  the  quasi- 

9 9 

incompressible  Navier-Stokes  equation  with 


effective  viscosity:  //(x,/)  = — 


4r 


a(x,t) 


-1 


(6) 


molecular  viscosity:  /i0=— (2r-l),  r>0.5 
6 


To  avoid  discrete  lattice  geometry  effects  polluting  the 
turbulence  simulations,  one  is  restricted  to  certain  Q ’s  on 
a cubic  lattice.  In  particular  it  can  be  shown  that  on  a unit 
cubic  lattice,  the  lowest  order  kinetic  velocity  models  are 

Q15:  rest  velocity,  speed  1 (6  velocities),  speed  >/3  (8 
velocities)  - i.e.,  0=15 

Q19:  rest  velocity,  speed  1 (6  velocities),  speed  n/2  (12 
velocities)  - i.e.,  0=19 

Q27:  rest  velocity,  speed  1 (6),  speed  \fl  (12),  and  speed 
x/3  (8) -i.e.,  0 = 27  (7) 

Because  detailed  balance  is  in-built  into  the  entropic  LB 
algorithm  [see  Eq.  (3)],  the  scheme  is  unconditionally 
stable  for  arbitrary  large  Reynolds  numbers,  Re  = 

UoL/2/c/Jo- 


3.  CAP  Parallelization  and  Simulation 
Results 

Since  entropic  LB  consists  of  local  collisional 
relaxation  and  simple  shifting  of  data  along  the  lattice 
links,  it  is  ideally  parallelized.  In  the  CAP -Phase  I run, 
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we  investigated  the  scaling  of  our  entropic  lattice 
Boltzmann  (ELB)-Q27  algorithm  on  the  Naval 
Oceanographic  Office  (NAVO)  IBM-P5  (Babbage)  all  the 
way  up  to  the  full  2,912  PEs  available,  achieving  over  6.3 
teraflops  (TFlops)/s  sustained  performance  (see  Figure  1). 
This  is  close  to  perfect  scaling  with  the  number  of  PEs. 
The  MHD-LB  code  has  been  run  on  the  Earth  Simulator 
on  4,800  PEs,  achieving  a sustained  timing  of  over  26 
TFlops/s  (which  is  67%  of  peak  on  this  vector 
machine) — the  best  performance  of  a scientific  code  on 
that  supercomputer  to  date.  On  Blue  Gene  this  code  has 
been  run  on  32,768  PEs  and  achieving  over  9.1  TFlops/s 
(23%  of  peak  on  this  #PEs  scalar  machine),  again  with 
excellent  scaling. 

Table  1.  GFIops/s  per  CPU  for  2912  CPUs  for  2,000 
time-steps  for  the  3 ELB-codes 


Wallclock  GFIops/s 


#PEs 

Grid 

Model 

(s) 

per  PE 

2912 

ca1950J 

ELB-Q27 

7 554.7 

2.17 

2912 

ca19503 

ELB-Q19 

5 602.7 

2.24 

2912 

ca19503 

ELB-Q15 

4 798.4 

2.05 

In  Table  1 we  show  the  wallclock  time  and  average 
performance  of  the  various  ELB  models  for  the  full  2,912 
PEs  available  for  2,000  LB  time-steps.  The  Q27  model, 
based  on  the  27  kinetic  streaming  vectors,  is  the  most 
memory  intensive  (about  1 KB/grid  point)  and  requires  a 
wallclock  time  which  is  over  1.5  times  that  required  by 
the  Q 15  model  (which  requires  just  0.5  KB/grid  point). 

For  CAP  Phase  II,  we  wished  to  investigate  the  role 
of  the  underlying  kinetic  lattice  symmetry  on  Navier- 
Stokes  turbulence,  since  all  three  ELB-algorithms  recover 
the  Navier-Stokes  equations  to  leading  order  in  the 
Chapman-Enskog  expansion.  This  is  particularly 
important  since  on  small  grids  (e.g.,  5123)  and  low 
molecular  viscosities  (juq  = 2xl0~4)  we[4]  had  found  very 
minor  differences  in  the  simulation  results  from  the  Q27, 
Q19,  and  Q15  models.  With  2,048  PEs  available  for  24 
hour  shifts,  the  maximal  spatial  grid  for  the  Q27 
algorithm  was  1,6003.  All  three  models  were  run  with  the 
same  base  parameters:  u0  = 0.035,  = 5*  10~4  on  the 

l,6003-grid  (i.e.,  with  a base  Re  = uoLUn^  ~ 18,000  and 
computational  resolution/grid  spacing  Re3/4/L  = 1)  for  a 
Kida  initial  velocity  profile161  with  delta-function  energy 
spectra.  In  Figure  2,  we  plot  the  normalized  kinetic 

energy  < |u(x,/)|2  > / <|u(x,0)|2  > , the  normalized 

enstrophy  < |ca(x,/)|2  > / < |a>(x,0)|2  > , palinstrophy 

2 P{t ) =<  |V  x <b|2  > , where  the  vorticity  to  = V x u,  and 
< ..  > represent  volume  average  over  the  periodic  domain. 
The  ELB-dissipation  rate  zftjis  defined  by 


*(0  = 2/V(0(V*-)’where  Sy  is  the  usual  rate  of  strain 

tensor  and  the  effective  relaxation  rate  (to  make  an 
analogy  with  standard  LB  algorithms) 
Ae/?(0=(<4  r/a(\,t)>- 1 )/6.  Clearly  the  Q 19-results 
significantly  deviates  even  qualitatively  from  the  Q27- 
and  Q 15-results,  while  there  is  strong  quantitative 
agreement  between  the  Q27-  and  Q15-  models  (up  to  a 
simple  rescaling).  This  contrasts  strongly  with  a low- 
resolution  grid  run  on  5 1 23  at  a somewhat  lower 
molecular  viscosity,  see  Figure  2(f).  It  appears  that  these 
differences  arise  from  the  Newton-Raphson  root  finder 
that  determines  at  each  grid  point  and  at  each  time  the 
a(x,t)  function  that  enforces  detailed  balance  on  the 
constant  entropic  surface,  Eq.  (3).  These  functions  a(x,t) 
seem  to  be  much  more  lattice  dependent,  i.e.,  whether 
Q27,  Q19  or  Q15,  than  would  have  been  gleaned  from 
small  grid  runs.  In  Figure  3,  we  plot  the  development  of 
the  longitudinal  and  transverse  ID  energy  spectra: 

^(m=LMm2|, ^(*,o=ZMk’')2|  w 

ky  >*.  ky  ’k, 

for  the  initial  Kida  velocity  profile  with  initial  delta 
function  spectra 

(*  , 0)  = £<?(*  -2),  and  £_  (k  , 0)  = £ |>(*  -2)  + g(k  -4>]  (9) 

While  the  terabytes  of  data  from  the  early  stages  (t  < 28K) 
of  the  Q27-run  are  being  retrieved  and  analyzed,  some  of 
the  data  from  the  t > 28K  has  been  analyzed.  The  energy 
spectra  approximately  obey  a k~5,i  Kolmogorov  law,  with 
a slight  upturn  at  the  very  large  kx  in  E,ong , indicating  that 
the  run  is  slightly  unresolved  at  these  scales. 

The  probability  distribution  functions  (pdfs)  for  the 
velocity  and  vorticity  components  are  shown  in  Figure  4. 
The  velocity  field  is  basically  Gaussian — but  with  tails 
that  are  substantially  higher  than  a Guassian.  These  tails 
die  out  with  time  as  seen  by  the  plot  of  P[ vx]  at  t = 29K 

(Figure  4a)  and  at  t = 41K  (Figure  4b).  The  pdfs  for  the 
other  velocity  components  have  very  similar  behavior. 
On  the  other  hand,  the  vorticity  pdf  is  well  fitted  by  an 
exponential  pdf.  This  is  indicative  of  intermittency  in  the 
turbulence. 

4.  Turbulence  Morphology  for  Free  Shear 
Turbulence 

There  is  a correlation  between  the  onset  of  turbulence 
in  a laminar-to-turbulence  transition  and  the  order- 
disorder  phase  transition  in  ferromagnetism.  Just  as  Ising 
lattice  models  are  fundamental  to  understanding  critical 
phenomena,  kinetic  lattice  gas  models  that  we  are 
pursuing  could  have  a similar  impact.  We  now  give  some 
preliminary  results  on  the  turbulence  morphology  from 
5123  grid  runs.  The  morphology  can  be  broken  down  into 
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three  main  stages,  Figure  5.  Stage  1 occurs  in  the  initial 
time  interval  0 < t < 3.2K  with  the  enstrophy  Q(/) 
increases  exponentially,  independent  of  the  viscosity. 
The  enstrophy  curve  is  plotted  in  Figure  5 with  the  integer 
dots  ‘1\  ‘2’  ...  ‘7’ — and  these  integers  correspond  to  the 
isosurfaces  of  constant  vorticity  at  t = IK,  t = 2K  ...  t = 
7K  in  Figure  5.  The  color  coding  is  based  on  the  value  of 
u ■ <» : grey  corresponds  to  u • <b  = -,  blue  for  u • a = +1 
and  red  for  u • cb  = -1 . In  this  initial  stage,  the  isosurfaces 
of  vorticity  are  stretched  with  a sharp  rise  in  dO/dt  (the 
sharply  rising  curve  above  the  enstrophy  curve  in  Stage 
1).  In  Stage  2,  for  time  3.2K  < t < 9K,  shown  shaded  in 
Figure  5,  there  is  large  scale  anisotropic  turbulence  with 
intermittency.  In  this  shaded  region  dDJdt  becomes 
jagged  and  predominantly  is  decreasing  in  large  spurts 
with  intermediate  avalanches  occurring  at  t = 5.  IK,  and 
6.75K  (vertical  red  lines  in  Stage  2 of  Figure  5).  Stage  3, 
for  9K  < t < 14K,  is  the  inertial  subrange  with  eventual 
exponential  decay  of  the  enstrophy  (see  curve  fitted  red 
line  that  fits  fi(?)  well  for  t > 10K).  In  this  Stage  3,  we 
see  the  onset  of  homogeneous  isotropic  small  scale 
turbulence  with  energy  cascading  to  small  scales  leading 
to  the  F5/3  Kolmogorov  energy  spectrum.  The  velocity 
pdf  is  Gaussian  while  the  vorticity  pdf  is  exponential  (see 
the  inset  plots  in  Figure  5). 

5.  Concluding  Remarks 

Using  CAP  resources  we  have  been  able  to  uncover 
lattice  geometry  effects  in  the  entropic  lattice  Boltzmann 
algorithm  that  had  not  been  expected  from  lower  grid 
resolution  runs.  In  the  entropic  formulation,  one  is 
working  with  a generalized  BGK  collision  operator  that 
has  within  it  the  germs  of  detailed  balance.  Thus,  the 
unconditionally  stable  algorithm  is  achieved  with  a 
variable  transport  coefficient,  not  unlike  Large  Eddy 
Simulations  (LES)  in  CFD.  Indeed,  we  have  explored 
this  connection  in  some  detail  but  will  report  those 
findings  elsewhere  due  to  space  limitations  here.  Another 
unexpected  result  unearthed  by  the  CAP  runs  was  the 
dependence  of  the  ELB  on  the  Mach  number.  A low 
Mach  number  expansion  has  to  be  performed  to 
analytically  evaluate  the  Lagrange  multipliers  arising  in 
the  extremization  of  the  H-function  subject  to  local 
collisional  constraints.  We  have  found  that  the  Q 15-bit 
model  is  less  sensitive  to  the  flow  Mach  number  than  the 
Q27-bit  model.  Another  somewhat  unexpected  finding 
was  the  importance  of  maintaining  the  distribution 
function  correlations  in  the  mesoscopic  description.  To 
perform  the  long-time  l,6003-grid  runs  we  needed  to 
perform  continuation  runs.  In  the  early  stages  of  CAP  we 
tried  to  minimize  the  amount  of  input/output  read- 
out/read-in and  to  reconstruct  the  relaxation  distribution 
function  from  its  moments  rather  than  keeping  the  full 


correlation  information.  While  this  did  not  affect  the 
energy  decay,  there  were  significant  discontinuities 
introduced  into  the  enstrophy  and  higher  energy  spectral 
moments. 

The  parallelization  strength  of  ELB  arises  from  the 
modeling  of  the  macroscopic  nonlinear  derivatives  by 
local  moments.  Chapman-Enskog  asymptotics  will  then, 
on  projecting  back  into  physical  space,  yield  these 
nonlinear  derivatives.  Indeed,  this  will  allow  ideally 
parallelized  Smagorinsky  type  LES  to  be  modeled  by  LB 
methods  and  in  LB-MHD  algorithms  enforce 
automatically  V • B = 0 without  the  recourse  to  expensive 
divergence  cleaning  algorithms. 

The  interconnection  between  quantum  algorithms 
that  can  run  on  quantum  (and  classical)  computers  and 
ELB  (that  can  only  run  on  classical  computers)  has  been 
outlined  as  well  as  a new  morphology  of  free  shear 
turbulence  and  the  onset  of  laminar-to-turbulence 
transition.  The  analogy  between 

Order-disorder  phase  ..........  ..  , . 

. ...  (Lattice)  Ising  Model 

transition  v ' b 

laminar-turbulence  ^ Entropic  Lattice 

fluid  transition  Boltzmann  Model 

is  being  strongly  pursued. 
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Figurel.  Scaling  of  ELB-Q27  on  Babbage,  showing  a near 
perfect  scaling  with  CPUs 
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Figure  2.  The  evolution  of  the  normalized  (a)  kinetic  energy 
K(f)/K(0),  (b)  enstrophy  n(t)/ 0(0)  ,(c)  palinstrophy  P(f),  (d) 
dissipation  rate  s(t),  (e)  re/t  for  the  Q27,  Q19,  and  <315 
algorithms  on  1,6003-grid  with  (f)  normalized  enstrophy  from 
a 5123-simulation  at  somewhat  lower  molecular  viscosity 


Figure  3(a).  The  longitudinal  energy  spectrum,  (b)  the 
transverse  energy  spectrum  for  t > 28K 


Figure  4.  The  pdf  for  the  velocity  component  vx  at  (a)  t = 29K, 
and  (b)  t = 41 K,  fitted  to  Gaussian  pdfs.  The  pdf  for  the 
vorticity  component  o>x  at  t = 40K  is  shown  in  (c),  fitted  to  an 
exponential  pdf. 
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Figure  5.  Surfaces  of  constant  vorticity  from  t = OK  to  t = 7K, 
color  coded  by  u * e>  [grey  = 0,  blue  = +1 , red  = -1].  Stage  1 : 
0<  t < 3.2K.  Vortex  stretching,  with  an  exponential  growth  in 
the  enstrophy  n(f)  - the  1....7  corresponds  to  vorticity 
isosurface  plots.  Stage  2:  3.2K  < t < 9K.  Large  scale 
anisotropic  turbulence  and  intermittency  occur.  The  major 
breaking  points  are  at  3.2K,  5.1  K and  6.8K  (red  lines  in 
enstrophy  plot).  Stage  3:  t > 9K.  Inertial  subrange  with 
homogeneous  isotropic  small  scale  turbulence  with  k"5'3 
Kolmogorov  energy. 
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